The Positional Set Processor (PSP) is a software tool for
manipulating mathematical objects such as sets, sequences,
ordered pairs, etc. The PSP serves as the underpinning for a
Data Model Processor (DMP), an experimental system for
emulating commercial and prototype database management
systems. The PSP also provides a mathematical basis for
semantic specification and interpretation of database
operations. While its powerful query facilities make the PSP
itself appear to be a database management system, it has no
explicit concept of data definition, no access control, no
integrity control, etc. The authors prefer to view the PSP as
a tool for specifying and building database management
systems. This paper reviews the mathematical formalism,
Positional Set Notation, and describes the design of the
Positional Set Processor.
a model should have these characteristics:

It should include specifications for data structures
(e.g. trees, networks, relations, etc.) plus specifications
for operations (e.g. find-record, join, etc.) on those data
structures.

It should account for both structures used for definitional
purposes and structures that appear as occurrences in the
database. It may also include structures used for secondary
indexing if the fundamental operations depend on those
structures.

It should be documented in the open literature,
independently of any vendor's literature.
An important concept in the DMP framework (which is
consistent with denotational semantics [JONE78]) is that the
concepts of a data model can be expressed as set-theoretic
objects. Thus, we have chosen to express the data structures
as "positional sets" and the operations as operations on
positional sets. This paper deals with the mathematical
formalism for expressing positional sets, called Positional
Set Notation (PSN), and a major component of the DMP, the
Positional Set Processor (PSP), a sophisticated tool for
storing and retrieving positional sets.
2. SUMMARY OF POSITIONAL SET NOTATION (PSN)

The philosophy behind Positional Set Notation is similar
to the philosophy behind the Vienna Definition Method
(VDM)--see [JONE78]. That is, the semantics of operations
(i.e. functions, processes, etc.) programmed in computer
languages should be defined in terms of abstractions that
are divorced from physical implementations. These
abstractions are, in turn, defined in a mathematical notation.
The VDM approach uses (with a few minor exceptions)
traditional mathematical notation and list structures. In this
work, we use PSN.

PSN provides sufficient expressive power to define a
wide range of data models. One salient feature is the
availability of the "labeled tuple", which is the mathematical
counterpart of an ADP "record" and simplifies the definition
of most data models. This section provides an informal
summary of PSN; more detail may be found in [HARD81a].

The essence of PSN is the recursive definition of the
positional set, S:

S := [x(1)@p(1), x(2)@p(2), ..., x(n)@p(n)]

where

* the p(i) are called position identifiers and must be
atoms, that is, either numbers or character strings.
The p(i)'s need not be unique.

* the x(i) are elements and may be either atoms or
themselves positional sets.

The mathematical object S, above, is a mathematical set of
ordered pairs. We surround it with square brackets rather
than braces for reasons discussed below. For convenience,
the construction "x(i)@p(i)" is called a "duplex". Duplexes
may appear in any order in the written representation of a
positional set.

An example of a positional set (abbreviated as "p-set")
in which all of the elements are atoms is:

[JONES@NAME, 25@AGE, UMD@SCHOOL, BA@DEGREE]

Here the elements are JONES, 25, UMD, and BA. The
position_identifiers (pids) are NAME, AGE, SCHOOL, and
DEGREE. An example of a p-set in which some of the elements
are themselves p-sets is the p-set "STUDENT":


[[[J@MI, JONES@LAST, JOHN@FIRST]@NAME, 25@AGE,
   [[UMD@SCHOOL, BA@DEGREE,
      [[CMSC410@COURSE, B@GRADE]@#,
       [CMSC420@COURSE, A@GRADE]@#]@CURRIC]@#]@ED_HIST]@#,
 [[S@MI, SMITH@LAST, SCOTT@FIRST]@NAME, 29@AGE,
   [[UNC@SCHOOL, PHD@DEGREE,
      [[CMSC620@COURSE, A@GRADE]@#,
       [CMSC630@COURSE, A@GRADE]@#,
       [CMSC798@COURSE, A@GRADE]@#,
       [CMSC720@COURSE, B@GRADE]@#]@CURRIC]@#]@ED_HIST]@#]

The symbol # denotes the null position identifier. Given in
tabular form with the position identifiers as column
headers, the p-set above looks like:

STUDENT

NAME                 AGE  ED_HIST
FIRST  MI  LAST           SCHOOL  DEGREE  CURRIC
                                          COURSE   GRADE

JOHN   J   JONES     25   UMD     BA      CMSC410  B
                                          CMSC420  A
SCOTT  S   SMITH     29   UNC     PHD     CMSC620  A
                                          CMSC630  A
                                          CMSC798  A
                                          CMSC720  B

The PSP described here has print operations that display
p-sets in tabular form as well as the formal bracketed
notation; both forms are shown above.
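
As an illustration only (this is not the PSP implementation), a p-set can be modeled in Python as a frozenset of (element, pid) pairs. The helper names `duplex`, `pset`, `elements`, and `pids` are hypothetical and chosen for this sketch; the last two mirror the operations that return the classical sets of elements and of position identifiers.

```python
NULL_PID = "#"  # the null position identifier


def duplex(element, pid=NULL_PID):
    """Build one duplex x@p as the ordered pair (element, pid)."""
    if isinstance(pid, bool) or not isinstance(pid, (int, float, str)):
        raise TypeError("a position identifier must be an atom")
    return (element, pid)


def pset(*duplexes):
    """Build a p-set: a mathematical set of duplexes, so order is irrelevant."""
    return frozenset(duplexes)


def elements(p):
    """Classical set of the elements of p-set p."""
    return frozenset(x for (x, _pid) in p)


def pids(p):
    """Classical set of the position identifiers of p-set p."""
    return frozenset(pid for (_x, pid) in p)


# [JONES@NAME, 25@AGE, UMD@SCHOOL, BA@DEGREE]
person = pset(duplex("JONES", "NAME"), duplex(25, "AGE"),
              duplex("UMD", "SCHOOL"), duplex("BA", "DEGREE"))

# The same duplexes written in a different order denote the same p-set.
reordered = pset(duplex("BA", "DEGREE"), duplex("UMD", "SCHOOL"),
                 duplex(25, "AGE"), duplex("JONES", "NAME"))

# An element may itself be a p-set: [[JONES@NAME, ...]@#]
people = pset(duplex(person))
```

Using an immutable frozenset keeps p-sets hashable, which is what allows one p-set to appear as an element of another.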

PSN allows the data modeler to use and manipulate a
number of objects similar to traditional mathematical
objects. The p-sets representing the classical set, the
ordered pair, the sequence, and the labeled tuple are shown in
the table below. As mentioned above, the square brackets
are used to surround p-sets so the braces can be used to
surround classical sets (i.e., sets in which all the pids
are "#"). The PSP will accept mathematical objects in either
the traditional notation or PSN. All traditional expressions
are converted to PSN. The mathematical underpinning is
explained in detail in [HARD81a].
OBJECT               TRADITIONAL             POSITIONAL SET
                     REPRESENTATION          NOTATION

CLASSICAL SET

SEQUENCE             <a(1),a(2),...,a(n)>    [a(1)@1, a(2)@2, ..., a(n)@n]

ORDERED PAIR         (a,b)                   [a@1, b@2]

2-ELEMENT SEQUENCE   <a,b>                   [a@1, b@2]

The labeled tuple is an important concept found in most
data models and the basis for the relational model:

C(1)  C(2)  ...  C(n)

a(1)  a(2)  ...  a(n)

The C(i) are column-headings and the a(i) are atomic elements;
this is represented in PSN as:

[a(1)@C(1), a(2)@C(2), ..., a(n)@C(n)]
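
The correspondence between traditional objects and p-sets can be sketched in Python as follows. This is a minimal illustration that assumes a p-set is held as a frozenset of (element, pid) pairs; the function names are hypothetical and are not PSP commands.

```python
NULL_PID = "#"  # the null position identifier


def classical_set(items):
    """Classical set: every element sits at the null pid."""
    return frozenset((x, NULL_PID) for x in items)


def sequence(items):
    """Sequence <a(1),...,a(n)>: the i-th element sits at pid i."""
    return frozenset((x, i) for i, x in enumerate(items, start=1))


def ordered_pair(a, b):
    """Ordered pair (a,b): same p-set as the 2-element sequence <a,b>."""
    return sequence([a, b])


def labeled_tuple(record):
    """Labeled tuple: each value sits at its column heading."""
    return frozenset((value, heading) for heading, value in record.items())


pair = ordered_pair("a", "b")                      # [a@1, b@2]
row = labeled_tuple({"NAME": "JONES", "AGE": 25})  # [JONES@NAME, 25@AGE]
```

Note that a sequence keeps repeated elements apart because their pids differ, whereas a classical set collapses them.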

Following is an example of a relational database
expressed first in the tabular form of [ZLOO75], then in PSN
using the relational formulation given in [HARD78].

Tabular form:

SALES   DEPARTMENT   ITEM

        STATIONERY   DISH
        HOUSEHOLD    PEN
        STATIONERY   PENCIL
        COSMETICS    LIPSTICK
        TOY          PEN
        TOY          PENCIL
        TOY          INK
        COSMETICS    PERFUME
        STATIONERY   PEN
        HARDWARE     INK

TYPE    ITEM         COLOR   SIZE

        DISH         WHITE   M
        LIPSTICK     RED     L
        PERFUME      WHITE   L
        PEN          GREEN   S
        PENCIL       BLUE    M
        INK          GREEN   L
        INK          BLUE    S
        PENCIL       RED     L
        PENCIL       BLUE    L

In PSN form, the set of relation occurrences, REL-OCC, is:

[[SALES@REL_NAME, [[STATIONERY@DEPARTMENT, DISH@ITEM]@#,
                   [HOUSEHOLD@DEPARTMENT, PEN@ITEM]@#,
                   ...
                   [HARDWARE@DEPARTMENT, INK@ITEM]@#]@RELATION]@#,
 [TYPE@REL_NAME, [[DISH@ITEM, WHITE@COLOR, M@SIZE]@#,
                  [LIPSTICK@ITEM, RED@COLOR, L@SIZE]@#,
                  ...
                  [PENCIL@ITEM, BLUE@COLOR, L@SIZE]@#]@RELATION]@#]


3. AN OVERVIEW OF THE POSITIONAL SET PROCESSOR

The Positional Set Processor (PSP) is a collection of
37 commands (see Table 1). These commands are operations on
positional sets; most commands take p-sets as input and
produce p-sets as output. They are not intended to comprise a
convenient high level query language.


The development of the PSP is part of an experimental
project and the collection of commands has been modified as
the Data Model Processor (DMP) design has proceeded. We
began with a basic set of Classical Set Operations (such as
union, intersection) that, when applied to positional sets,
are analogous to the traditional set operations. The
Utility Operations (such as print, copy) provide additional
capabilities needed for use in an interactive environment.


The Positional Set Operations provide the user with the
basic database functions--retrieving, updating, adding, and
deleting. The operation "CREATE" is the most important
operation and is analogous to the relational DBMS operations
"Retrieve" in QUEL or "SELECT" in SQL. There are also
operations to return the (classical) set of elements, return
the (classical) set of position_identifiers for a p-set, and
distribute a position_identifier over a classical set.
TEMPLATE, POPULATE, and CONFORM were designed specifically for
use by the various DMP roles.


Since some data models (e.g., CODASYL) require the
manipulation of sequences, the Sequence Operations were
added to allow manipulation of sequences as a special form of
p-sets.


We will not attempt to fully describe all commands in this
paper, but will concen... group, the Positional Set
Operations. A ...tion of the commands (including both s...
TABLE 1

PSP COMMANDS

Classical Set Operations
    IN       Intersection
    UN       Union
    RC       Relative Complement
    SY       Symmetric Difference
    CD       Cardinality

Positional Set Operations
    CR       Create
    RG       Range
    FR       Freeze
    ST       Start
    RL       Release
    TH       Thaw

Extended Positional Set Operations
    BR       Break
    RK       Recurse
    TMP      Template
    POP      Populate
    CNF      Conform

Sequence Operations
    SQ_CAT   Concatenate sequence
    SQ_EXC   Extract by contents
    SQ_EXI   Extract by index
    SQ_FLC   Flush by contents
    SQ_IAC   Insert after contents
    SQ_IAI   Insert after index
    SQ_IBC   Insert before contents
    SQ_RPC   Replace element

Utility Operations - Non-predicates
    CP       Copy
    DE       Delete
    NS       Null set
    PT       Put
    EN       Enter
    PN       Print names
    PS       Print set

Utility Operations - Predicates
    CS       Classical set
    EC       Equal
    ISIN     Is in
    NL       Null
    NN       Not null
    SB       Subset
4. POSITIONAL SET OPERATIONS

The Positional Set Operations (CREATE, RANGE, FREEZE,
START, RELEASE, and THAW) allow traversal of p-sets and
generation of new p-sets based on a specified condition. In
this section we will discuss "range variables" and the "dot
operator" and will then describe the commands.

Range Variables

Range variables (rv) allow access to p-sets. The name
"range variable" is taken from ALPHA [CODD71] and QUEL
[STON76], but the concept is more general here. Unlike
(normalized) relations, p-sets may be nested to any depth. The
range variables provide a mechanism for accessing nested
structures as well as first normal form relations.

Each rv is associated with a particular p-set. That
is, the declaration of a range variable enables that
variable to take on, one-at-a-time and in no particular order,
the values of the elements of a p-set. If X is a variable
ranging over the p-set P, then "print all X" would print all
the elements of P. The qualification "X = <value>" is TRUE
iff ∃X (X ∈ P AND X = <value>). Thus, "print all X where
<condition>" would print those elements of P which satisfy
<condition>.

The association between the p-set and the rv is
established with the RANGE command and (unlike QUEL) remains in
effect until cleared by the RL or ST command (see below).

Dot Operator

The dot operator, often used without formal definition,
has been a mainstay for several relational languages (e.g.
QUEL, SEQUEL2), as it is quite useful in designing query
languages with substantial expressive power. It is also a
notational device to simultaneously allow traversal of a set
and access to (elements of) tuples within the set. Since
both the tuples and the sets can be represented as p-sets, a
more precise definition of the dot operator expression can
be given in PSN.

Let X range over the p-set P. At any particular time,
X is a pseudonym for a tuple T that is an element of P. The
construction "X.ATT" refers to the element at pid ATT
contained in the tuple T. "Print all X.ATT" would print all
the values, Y, such that Y@ATT is an element of an element
of P. That is:

"X.ATT = <value>" is TRUE iff
∃Y (Y@ATT ∈ T AND T ∈ P AND Y = <value>).

For example, consider the <condition>:

X.AGE = '21'

as evaluated for a particular value of X (i.e., some element
of P). The <condition> is TRUE if "21@AGE" is an element of
X (i.e., an element of an element of P). Similarly, the
<condition>:

X.AGE > '30'

returns TRUE if there exists an M such that "M@AGE" is an
element of X and M > '30'.
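
The existential reading of the dot operator can be sketched as follows. This is an illustration under stated assumptions: a p-set is held as a frozenset of (element, pid) pairs, ages are stored as integers rather than quoted strings, the sample data are invented for the sketch, and `dot` and `holds` are hypothetical helper names.

```python
def dot(t, att):
    """X.ATT: every value Y such that Y@ATT is a duplex of the tuple T."""
    return {y for (y, pid) in t if pid == att}


def holds(t, att, compare):
    """'X.ATT <op> <value>' is TRUE iff some Y with Y@ATT in T satisfies it."""
    return any(compare(y) for y in dot(t, att))


# A small p-set P of labeled tuples, each at the null pid.
P = frozenset({
    (frozenset({("JONES", "NAME"), (25, "AGE")}), "#"),
    (frozenset({("SMITH", "NAME"), (29, "AGE")}), "#"),
})

# "print all X.NAME where X.AGE > 26"
older = {name
         for (t, _pid) in P
         if holds(t, "AGE", lambda y: y > 26)
         for name in dot(t, "NAME")}
```

Because pids need not be unique, `dot` returns a set of values; a condition on a pid that is absent from the tuple is simply false.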

Commands on Range Variables

The START (ST) command initializes the global range
variable by deleting all range variable associations.

The RANGE (RG) command declares and associates a range
variable with a p-set. It has two forms:

RG <rv> IS <p-set name>
or
RG <rv1> IS <rv2>.<pid>

For example, the command:

RG X IS STUDENT

declares "X" a range variable and associates it with the
p-set named "STUDENT". The rv "X" ranges over the duplexes
of the p-set "STUDENT".

The command form to declare and associate an rv with a
p-set element (that is itself a p-set) is somewhat more
complex. The dot operator indicates that the <pid> is at
the next level of nesting within the p-set or p-set element
associated with <rv2>. For example, the command:

RG Y IS X.ED_HIST

declares "Y" a range variable and associates it with the
element of X which has pid "ED_HIST". "ED_HIST" is at the next
level of nesting within the p-set over which "X" ranges,
that is, "STUDENT".
Declaring one range variable in terms of another
implies a relationship between range variables that is
important for the semantics of the CREATE command; this is an
ancestor relationship. If X is an rv and appears in an RG
statement:

RG X IS <p-set name>

then X has no ancestors. If Z is an rv and appears in a
statement:

RG Z IS Y.<pid>

then Y is both an ancestor and the parent of Z. All of Y's
ancestors are also ancestors of Z. In the example:

RG X IS STUDENT
RG Y IS X.ED_HIST
RG Z IS Y.CURRIC

"X" has no ancestors; "X" is an ancestor of "Y" and "Z" and
the parent of "Y"; "Y" is an ancestor and the parent of "Z".
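
The ancestor relationship amounts to following parent links. A minimal sketch, assuming the RG declarations are recorded in a dictionary from each rv to its parent (the dictionary and the function names are hypothetical, not PSP internals):

```python
def declare(parents, rv, parent=None):
    """Record 'RG rv IS <p-set name>' (parent=None) or 'RG rv IS parent.<pid>'."""
    parents[rv] = parent


def ancestors(parents, rv):
    """Return the ancestors of rv, nearest (the parent) first."""
    chain = []
    current = parents[rv]
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


parents = {}
declare(parents, "X")        # RG X IS STUDENT
declare(parents, "Y", "X")   # RG Y IS X.ED_HIST
declare(parents, "Z", "Y")   # RG Z IS Y.CURRIC
```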

The RELEASE (RL) command, which has the form:

RL <rv>

removes the association between the specified rv and a
p-set.

The FREEZE (FR) command binds the rv to one duplex of
the p-set for which the specified <condition> is true. This
means that the rv no longer ranges over the entire p-set,
but indicates only one duplex. The form of the FREEZE
command is:

FR <rv> WHERE <condition>

The rv must have been previously associated with a p-set
using an RG command. The <condition> is a predicate--an
expression involving comparison operators, boolean operators,
and parentheses, that returns "true" or "false". For
example, the command:

FR X WHERE "X.AGE = '29'"

indicates that the rv "X" ranges over the duplexes in the
p-set with which it is associated. When a duplex that
satisfies the condition (that the value of the element
associated with the pid "AGE" is '29') is found, the value of the
rv "X" is then frozen at (i.e. bound to) that duplex. If
the condition is never satisfied, the rv remains free.
If an rv is frozen with the FREEZE command, all of its
ancestors will be frozen also. For example, if the unfrozen
rv's "X", "Y", and "Z" are associated with the p-sets
"STUDENT", "X.ED_HIST", and "Y.CURRIC", respectively, the
command:

FR Z WHERE "Z.COURSE='CMSC798'"

will not only freeze "Z" at a duplex in the p-set element
"Y.CURRIC" which has 'CMSC798' as the value of the element
associated with the pid "COURSE", but will also freeze "Y"
and "X" at the duplexes which are their values when the
condition on "Z" is satisfied.

Note that when more than one duplex satisfies the
condition, the choice of a duplex is implementation-dependent
and cannot be anticipated or controlled by the user. The
capability to traverse the p-set, successively freezing on
each duplex that satisfies the condition, is an option.

The THAW (TH) command, which removes the FREEZE
binding, has the form:

TH <rv>

When an rv is thawed, any frozen descendants are also
thawed; ancestors are not affected. Thus in general, the
"TH X" command is not the reciprocal of the "FR X" command.

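The asymmetry between FREEZE and THAW can be seen in a small bookkeeping sketch. It tracks only which range variables are frozen, not the duplexes they are bound to, and it assumes the three declarations RG X IS STUDENT, RG Y IS X.ED_HIST, RG Z IS Y.CURRIC; the `PARENT` table and the function names are hypothetical.

```python
# Parent of each range variable, as declared by the three RG commands.
PARENT = {"X": None, "Y": "X", "Z": "Y"}


def ancestors(rv):
    """All ancestors of rv, nearest first."""
    chain = []
    current = PARENT[rv]
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def descendants(rv):
    """Every range variable that has rv among its ancestors."""
    return {other for other in PARENT if rv in ancestors(other)}


def freeze(frozen, rv):
    """FR rv: the rv and all of its ancestors become frozen."""
    return frozen | {rv} | set(ancestors(rv))


def thaw(frozen, rv):
    """TH rv: the rv and its frozen descendants thaw; ancestors stay frozen."""
    return frozen - ({rv} | descendants(rv))


after_freeze = freeze(set(), "Z")      # X, Y, and Z are frozen
after_thaw = thaw(after_freeze, "Z")   # X and Y remain frozen
```

Freezing "Z" and then thawing "Z" does not restore the initial state, which is the sense in which TH is not the reciprocal of FR.
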
The CREATE Command

The PSP CREATE (CR) command is analogous to SELECT in
the SEQUEL language of SYSTEM R [ASTR76] or RETRIEVE in the
QUEL language of INGRES [STON76]. However, unlike SEQUEL
and QUEL, the PSP allows nested relations. In relational
jargon, this means that first normal form (1NF) is not
required. The CREATE command builds a new p-set having (1)
the specified attributes from the original p-set(s) and (2)
the duplexes which satisfy a given condition. It has the
form:

CR P WITH <attribute-list> WHERE <condition>

The <condition> has the same form and meaning for the CREATE
command as for the FREEZE command. The <attribute-list>
defines the structure of the new p-set by identifying its
pids. Syntactically, it is a list of <term>'s and
<assignment>'s. A <term> specifies which pids are to be
taken from which p-sets; the basic form is:

"<rv>.<pid>"
The <assignment> has the form:

"<pid> := <phrase>"

This allows the user to specify a pid for the new p-set
which is not related to a previously defined p-set and to
assign a value for its element. The <phrase> may be an
arithmetic expression, a value, or a basic <term>. To rename
the pid in the new p-set, this form of the <assignment> is
used:

"<pid> := <rv>.<pid>"

An example of the CR command using our sample database
is:

RG X IS STUDENT
RG Y IS X.ED_HIST
RG Z IS Y.CURRIC

CR NEWSET WITH "X.NAME, X.AGE, CLASS := '1981',
   TRANSCRIPT := Y.CURRIC" WHERE "Y.DEGREE = 'PHD' |
   (Z.COURSE = 'CMSC498' & Z.GRADE = 'A')"

In the "WHERE" clause, "|" is the logical "OR" operator and
"&" is the logical "AND" operator. The p-set "NEWSET" will
have the pids "NAME", "AGE", "CLASS", and "TRANSCRIPT".
"NAME" and "AGE" will be the same as in "STUDENT", and
"TRANSCRIPT" will be the same as "CURRIC" in "STUDENT" (only the
pid name is different). Each duplex will have the new pid
"CLASS" which has the element value '1981'. The duplexes
chosen from "STUDENT" are those which have 'PHD' as the
value of "DEGREE" or that have a duplex in "CURRIC" with
'CMSC498' as the value of "COURSE" and 'A' as the value of
"GRADE". The answer to this query, given the p-set
"STUDENT" of section 2, would be:

PSN form:

[[[S@MI, SMITH@LAST, SCOTT@FIRST]@NAME, 29@AGE, 1981@CLASS,
   [[CMSC620@COURSE, A@GRADE]@#,
    [CMSC630@COURSE, A@GRADE]@#,
    [CMSC798@COURSE, A@GRADE]@#,
    [CMSC720@COURSE, B@GRADE]@#]@TRANSCRIPT]@#]

Tabular form:

NEWSET

NAME                 AGE  CLASS  TRANSCRIPT
LAST   FIRST  MI                 COURSE   GRADE

SMITH  SCOTT  S      29   1981   CMSC620  A
                                 CMSC630  A
                                 CMSC798  A
                                 CMSC720  B


Notice the similarity in the QUEL equivalent of this
CREATE statement:

RETRIEVE INTO NEWSET
   (AGE=X.AGE, NAME=X.NAME, TRANSCRIPT=Y.CURRIC, CLASS="1981")
   WHERE Y.DEGREE="PHD" OR
   (Z.COURSE="CMSC498" AND Z.GRADE="A")
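
One way to read the CREATE example is as nested iteration over the three range variables, with the condition on "Z" taken existentially. The sketch below is not the PSP algorithm: for readability it assumes labeled tuples are Python dicts and nested sets are lists of dicts, and the name `create_newset` is hypothetical.

```python
def courses(*pairs):
    """Build a CURRIC value from (course, grade) pairs."""
    return [{"COURSE": c, "GRADE": g} for (c, g) in pairs]


STUDENT = [
    {"NAME": {"MI": "J", "LAST": "JONES", "FIRST": "JOHN"}, "AGE": 25,
     "ED_HIST": [{"SCHOOL": "UMD", "DEGREE": "BA",
                  "CURRIC": courses(("CMSC410", "B"), ("CMSC420", "A"))}]},
    {"NAME": {"MI": "S", "LAST": "SMITH", "FIRST": "SCOTT"}, "AGE": 29,
     "ED_HIST": [{"SCHOOL": "UNC", "DEGREE": "PHD",
                  "CURRIC": courses(("CMSC620", "A"), ("CMSC630", "A"),
                                    ("CMSC798", "A"), ("CMSC720", "B"))}]},
]


def create_newset(student):
    """CR NEWSET WITH X.NAME, X.AGE, CLASS := '1981', TRANSCRIPT := Y.CURRIC."""
    result = []
    for x in student:                       # RG X IS STUDENT
        for y in x["ED_HIST"]:              # RG Y IS X.ED_HIST
            qualifies = y["DEGREE"] == "PHD" or any(
                z["COURSE"] == "CMSC498" and z["GRADE"] == "A"
                for z in y["CURRIC"])       # RG Z IS Y.CURRIC
            if qualifies:
                row = {"NAME": x["NAME"], "AGE": x["AGE"],
                       "CLASS": "1981", "TRANSCRIPT": y["CURRIC"]}
                if row not in result:       # a p-set holds no duplicates
                    result.append(row)
    return result


NEWSET = create_newset(STUDENT)
```

With the sample data, only the SMITH duplex qualifies, through the 'PHD' branch of the condition.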

Since the PSP was designed to contain the functionality
of the relational operators, it is appropriate to discuss
how the relational operators (i.e. PROJECT, SELECT, and
JOIN) as well as p-set operators can be expressed in terms
of the CREATE command. An informal discussion and some
examples are given here.

A PROJECT can be represented as a CR with no
<condition>, with the effect that each duplex of the original
p-set is selected, but only the pids specified in the
attribute-list are copied. For example,

RG X IS STUDENT

CR AGES WITH "X.AGE"

creates the p-set "AGES" which is a classical set containing
all the elements with pid "AGE" in the p-set "STUDENT".


PSN form:

[[25@AGE]@#, [29@AGE]@#]

Tabular form:

AGES

AGE

25
29


A slightly more complicated example is a PROJECT involving a
nested pid:

RG X IS STUDENT

RG Y IS X.ED_HIST

CR SCHOOLS WITH "Y.SCHOOL"

This creates the p-set "SCHOOLS", a classical set containing
all elements with the pid "SCHOOL" at the "ED_HIST" level of
the p-set "STUDENT". Any duplicate duplexes are, of course,
eliminated, since a p-set is a mathematical set.


PSN form:

[[UMD@SCHOOL]@#, [UNC@SCHOOL]@#]

Tabular form:

SCHOOLS

SCHOOL

UMD
UNC
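
The duplicate elimination in a nested PROJECT follows directly from set semantics. A minimal sketch, assuming the input is a list of dicts and the output is a frozenset of (element, pid) pairs; the third student record is invented here solely to produce a duplicate, and `project_schools` is a hypothetical name.

```python
NULL_PID = "#"

# Reduced STUDENT data; the third entry is hypothetical and repeats UNC.
STUDENT = [
    {"AGE": 25, "ED_HIST": [{"SCHOOL": "UMD"}]},
    {"AGE": 29, "ED_HIST": [{"SCHOOL": "UNC"}]},
    {"AGE": 31, "ED_HIST": [{"SCHOOL": "UNC"}]},
]


def project_schools(student):
    """CR SCHOOLS WITH "Y.SCHOOL", where Y ranges over X.ED_HIST."""
    return frozenset(
        (frozenset({(y["SCHOOL"], "SCHOOL")}), NULL_PID)
        for x in student          # RG X IS STUDENT
        for y in x["ED_HIST"]     # RG Y IS X.ED_HIST
    )


SCHOOLS = project_schools(STUDENT)
```

The two identical [UNC@SCHOOL]@# duplexes collapse into one because the result is a set.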


A SELECT can be expressed as a CR which copies each
duplex of a p-set that satisfies a condition. For example,

RG X IS STUDENT
RG Y IS X.ED_HIST

CR DOCTORS WITH "X.NAME, X.AGE, X.ED_HIST" WHERE
   "Y.DEGREE='PHD'"

This creates a p-set "DOCTORS" having the same structure as
the original p-set "STUDENT" but a subset of the original
duplexes--those that satisfy the condition that 'PHD' is the
value of the element with pid "DEGREE" at the "ED_HIST"
level.
PSN form:

[[[S@MI, SMITH@LAST, SCOTT@FIRST]@NAME, 29@AGE,
   [[UNC@SCHOOL, PHD@DEGREE,
      [[CMSC620@COURSE, A@GRADE]@#,
       [CMSC630@COURSE, A@GRADE]@#,
       [CMSC798@COURSE, A@GRADE]@#,
       [CMSC720@COURSE, B@GRADE]@#]@CURRIC]@#]@ED_HIST]@#]

Tabular form:

DOCTORS

NAME                 AGE  ED_HIST
FIRST  MI  LAST           SCHOOL  DEGREE  CURRIC
                                          COURSE   GRADE

SCOTT  S   SMITH     29   UNC     PHD     CMSC620  A
                                          CMSC630  A
                                          CMSC798  A
                                          CMSC720  B


A JOIN can be expressed as a CR which concatenates
duplexes from two p-sets, based on a condition that links the
p-sets through some pids that presumably have the same
domain. For example, given the p-set "ADDRESS":

PSN form:

[[[S@MI, SMITH@LAST, SCOTT@FIRST]@PERSON, RALEIGH@CITY, NC@STATE]@#,
 [[J@MI, JONES@LAST, JOHN@FIRST]@PERSON, PITTSBURGH@CITY, PA@STATE]@#]

Tabular form:

ADDRESS

PERSON               CITY        STATE
FIRST  MI  LAST

SCOTT  S   SMITH     RALEIGH     NC
JOHN   J   JONES     PITTSBURGH  PA
The PSP commands:

RG X IS STUDENT
RG S IS ADDRESS

CR COMPLETE WITH "X.NAME, X.AGE, X.ED_HIST, S.CITY, S.STATE"
   WHERE "X.NAME=S.PERSON"

create the p-set "COMPLETE" which consists of all
concatenations of a duplex from "STUDENT" and a duplex from "ADDRESS"
containing the same value of the elements corresponding to
pid "NAME" in "STUDENT" and pid "PERSON" in "ADDRESS". The
answer to this query, given the example "STUDENT" database,
is:

PSN form:

[[[J@MI, JONES@LAST, JOHN@FIRST]@NAME, 25@AGE,
   [[UMD@SCHOOL, BA@DEGREE,
      [[CMSC410@COURSE, B@GRADE]@#,
       [CMSC420@COURSE, A@GRADE]@#]@CURRIC]@#]@ED_HIST,
   PITTSBURGH@CITY, PA@STATE]@#,
 [[S@MI, SMITH@LAST, SCOTT@FIRST]@NAME, 29@AGE,
   [[UNC@SCHOOL, PHD@DEGREE,
      [[CMSC620@COURSE, A@GRADE]@#,
       [CMSC630@COURSE, A@GRADE]@#,
       [CMSC798@COURSE, A@GRADE]@#,
       [CMSC720@COURSE, B@GRADE]@#]@CURRIC]@#]@ED_HIST,
   RALEIGH@CITY, NC@STATE]@#]

Tabular form:

COMPLETE

NAME                 AGE  ED_HIST                            CITY        STATE
FIRST  MI  LAST           SCHOOL  DEGREE  CURRIC
                                          COURSE   GRADE

JOHN   J   JONES     25   UMD     BA      CMSC410  B         PITTSBURGH  PA
                                          CMSC420  A
SCOTT  S   SMITH     29   UNC     PHD     CMSC620  A         RALEIGH     NC
                                          CMSC630  A
                                          CMSC798  A
                                          CMSC720  B
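
The JOIN can be sketched as a nested loop over both range variables that keeps each concatenation satisfying the linking condition. As before, this is an illustration that assumes labeled tuples are Python dicts; the data are abbreviated (the CURRIC contents are omitted) and `join_complete` is a hypothetical name.

```python
STUDENT = [
    {"NAME": {"MI": "J", "LAST": "JONES", "FIRST": "JOHN"}, "AGE": 25,
     "ED_HIST": [{"SCHOOL": "UMD", "DEGREE": "BA"}]},
    {"NAME": {"MI": "S", "LAST": "SMITH", "FIRST": "SCOTT"}, "AGE": 29,
     "ED_HIST": [{"SCHOOL": "UNC", "DEGREE": "PHD"}]},
]

ADDRESS = [
    {"PERSON": {"MI": "S", "LAST": "SMITH", "FIRST": "SCOTT"},
     "CITY": "RALEIGH", "STATE": "NC"},
    {"PERSON": {"MI": "J", "LAST": "JONES", "FIRST": "JOHN"},
     "CITY": "PITTSBURGH", "STATE": "PA"},
]


def join_complete(student, address):
    """CR COMPLETE WITH X.NAME, X.AGE, X.ED_HIST, S.CITY, S.STATE
    WHERE X.NAME = S.PERSON."""
    return [
        {"NAME": x["NAME"], "AGE": x["AGE"], "ED_HIST": x["ED_HIST"],
         "CITY": s["CITY"], "STATE": s["STATE"]}
        for x in student            # RG X IS STUDENT
        for s in address            # RG S IS ADDRESS
        if x["NAME"] == s["PERSON"]
    ]


COMPLETE = join_complete(STUDENT, ADDRESS)
```

The comparison X.NAME = S.PERSON is between whole nested tuples, so it succeeds only when every component (MI, LAST, FIRST) agrees.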
5. CONCLUDING REMARKS

The first version of the PSP is operational in an
experimental environment. The system is quite slow and only
works with very small databases; so far, little work has
been done on performance improvement. This section
discusses the success of the design and our plans for the
future.

The PSP and Data Models

As mentioned earlier, the PSP is intended to be used as
a tool for studying DBMS's, not as a DBMS itself. It has no
data definition facility, nor any restrictions on database
design, as would exist in a commercial DBMS that implements
a particular data model. However, unlike most tools, the
PSP has a tremendously powerful query facility.
Consequently, it is useful to compare the expressive capabilities of
the PSP to those of various data models.

The query language of the PSP may be viewed as an
extension of relational query languages. Range variables,
similar to those in QUEL, enable referencing of attributes
in a relation. The basic data retrieval and manipulation
command, CREATE, is analogous to SELECT in SYSTEM R.
However, the PSP has some important features that extend its
expressive power beyond that of current relational systems.

* It allows nesting of all kinds of mathematical
objects within one another: sets, sequences, relations,
or any other structures definable in PSN.

* There are no intrinsic normal forms. Normal forms
are restrictions on the original mathematical concept
of a relation. The proliferation of normal
forms as anomalies or limitations are discovered in
each new one suggests that normalization is not
worth the sacrifice of the power of the mathematical
definition of "relation".

* The database may contain nested hierarchical
structures that can be referenced with relational-like
commands.

* Nodes from different subtrees can be specified in
the same query.

Thus, the PSP combines concepts from several data models and
serves as a powerful underpinning for the Data Model
Processor and as a basis for the study of future database
management systems.
Future Directions

Project efforts in the next year are directed toward
applications of the Data Model Processor to such areas as
data model mappings, commercial DBMS validation, evaluation
of data model specifications, and query language
translation. A paper on the DMP [KOLL82] is forthcoming. Work
will also be done on improving the efficiency and expressive
power of the PSP, so that it will be a more effective and
efficient tool. Some of our plans are discussed below.

Another version of the PSP, based on the same user's
manual, but using an integer set processor (ISP), may be
implemented. This would be similar to a previous
implementation described in [HARD76]. The ISP has the potential to
perform set operations rapidly on rather large classical
sets of positive integers represented by bit strings. A
revised bit-string technique has been developed by Gallagher
[GALL81]. By mapping p-sets onto the integers (using an
element table and the Cauchy-Cantor technique), the power of
an ISP can be used to implement p-sets.
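
The idea of mapping p-sets onto integers can be sketched as follows. This is a guess at the general approach, not the ISP design: it assumes the "Cauchy-Cantor technique" refers to the standard Cantor pairing function, it handles only p-sets whose elements are atoms, and `ElementTable`, `cantor_pair`, `cantor_unpair`, and `encode` are hypothetical names.

```python
import math


def cantor_pair(x, y):
    """Map a pair of non-negative integers to one non-negative integer."""
    return (x + y) * (x + y + 1) // 2 + y


def cantor_unpair(z):
    """Invert cantor_pair."""
    w = (math.isqrt(8 * z + 1) - 1) // 2
    y = z - w * (w + 1) // 2
    return (w - y, y)


class ElementTable:
    """Assigns each atom a small integer code on first use."""

    def __init__(self):
        self._codes = {}

    def code(self, atom):
        if atom not in self._codes:
            self._codes[atom] = len(self._codes)
        return self._codes[atom]


def encode(table, pset):
    """Represent a flat p-set as a bit string (a Python int): bit n is set
    iff some duplex (element, pid) of the p-set pairs to the integer n."""
    bits = 0
    for element, pid in pset:
        bits |= 1 << cantor_pair(table.code(element), table.code(pid))
    return bits


table = ElementTable()
a = frozenset({("JONES", "NAME"), (25, "AGE")})
b = frozenset({(25, "AGE"), ("UMD", "SCHOOL")})

union_bits = encode(table, a) | encode(table, b)          # UN as bitwise OR
intersection_bits = encode(table, a) & encode(table, b)   # IN as bitwise AND
```

Once duplexes are integers, union and intersection reduce to single bitwise operations; nested p-sets would additionally need codes of their own in the element table.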

Measurement and predictive performance modeling
techniques may be used to direct the further development of the
prototype PSP. A new predictive model that handles both
recursive routines and multiple concurrent processes may be
designed, implemented, and used to gather data on the
execution of the major routines of the PSP. Based on the
results, a model, similar to [DEUT78], that predicts
performance for the prototype may be attempted. The model would
be validated by checking predictions against actual
performance in test cases, then the model would be used to
forecast performance in an operational environment.
Modification and optimization efforts would concentrate on the PSP
routines that the model shows to be problem areas.


The expressive power of the PSP will be increased by
modifications to some of the commands and operators.
Additional features for creating and accessing nested structures
will be specified.


We will be examining the relationship between the PSP
and other work on semantics, abstract data modeling, and
functional languages. We are trying to incorporate some
ideas from functional languages [BACK80] but this is just
beginning.